home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Amiga Plus Special 17
/
AMIGAplus Sonderheft 17 (1999)(ICP)(DE)[!].iso
/
PD
/
Anwendungen
/
-DataTypes-
/
mpegvideo
/
mpeg2enc.doc
< prev
next >
Wrap
Text File
|
1998-06-17
|
25KB
|
655 lines
mpeg2encode
===========
MPEG-2 Video Encoder, Version 1.1, June 1994
MPEG Software Simulation Group
(MPEG-L@netcom.com)
Contents:
=========
1. Overview
2. Features and Limitations
3. Usage
4. Interpreting the status information
5. Description of the Encoder Model
6. References
1. Overview
===========
This is the second public release of our MPEG-2 encoder. It converts an
ordered set of uncompressed input pictures into a compressed bitstream
compliant with ISO/IEC 13818-2 DIS [1] (MPEG-2).
This program will evolve to become: ISO/IEC 13818-5 Software Simulation
of MPEG-2 Systems, Video, and Audio.
2. Features and Limitations
===========================
2.1 Features
- generates constant bit rate streams
- encoder model based on MPEG-2 test model 5 (TM5) rev. 2 [2]
- progressive and interlaced video
- also generates ISO/IEC 11172-2 (MPEG-1) sequences
- input formats: separate YUV, combined YUV, PPM
- trace and statistics output
- verifies user parameter settings are legal within Profile and Level
2.2 Current limitations
The encoder currently does not support
- scalable extensions
- MPEG-1 integer pel vectors (half-pel is typically 1 dB better anyway)
and D frame sequences.
- checking for maximum number of generated bits per macroblock
- automatic 3:2 pulldown detection or irregular 3:2 pulldown signaling.
- intra refresh slices (e.g. low delay)
- concealment motion vectors
- special low delay mode rate control
- editing of encoded video
- variable bit rate encoding
- scene change rate control
3. Usage
========
The execution template for the encoder is:
mpeg2encode parameter_file output.m2v
Coding parameters can be modified by editing the parameter_file. Since the
parser expects the operating parameters to be on certain line numbers,
kindly do not insert or delete lines from the file.
We have provided a couple of sample parameter files in the par directory.
It is recommended to use the one closest to your application as a starting
point for customization.
The first line of the parameter file is a comment which is inserted near
the beginning of the MPEG bitstream as a user_data field, and can be
used for arbitrary purposes.
The remaining lines are described below:
/* name of source frame files */
A printf format string defining the name of the input files. It has to
contain exactly one numerical descriptor (%d, %x etc.):
Example: frame%02d
Then the encoder looks for files: frame00, frame01, frame02
The encoder adds an extension (.yuv, .ppm, etc.) which depends on the
input file format. Input files have to be in frame format, containing
two interleaved fields (for interlaced video).
/* name of reconstructed frame files */
This user parameter tells the encoder what name to give the reconstructed
frames. These frames are identical to frame reconstructions of decoders
following normative guidelines (except of course for differences caused by
different IDCT implementation). Specifying a name starting with - (or just
- by itself) disables output of reconstructed frames.
The reconstructed frames are always stored in Y,U,V format (see below),
independent of the input file format.
/* name of intra quant matrix file ("-": default matrix) */
Setting this to a value other than - specifies a file containing
a custom intra quantization matrix to be used instead of the default
matrix specified in ISO/IEC 13818-2 and 11172-2. This file has to contain
64 integer values (range 1...255) separated by white space (blank, tab,
or newline), one corresponding to each of the 64 DCT coefficients. They
are ordered line by line, i.e. v-u frequency matrix order (not by the
zig-zag pattern used for transmission). The file intra.mat contains the
default matrix as a starting point for customization. It is neither
necessary or recommended to specify the default matrix explicitly.
Large values correspond to coarse quantization and consequently more
noise at that particular spatial frequency.
For the intra quantization matrix, the first value in the file (DC value)
is ignored. Use the parameter intra_dc_precision (see below) to define
the quantization of the DC value.
/* name of non intra quant matrix file ("-": default matrix) */
This parameter field follows the same rules as described for the above
intra quant matrix parameter, but specifies the file for the NON-INTRA
coded (predicted / interpolated) blocks. In this case the first
coefficient of the matrix is NOT ignored.
The default matrix uses a constant value of 16 for all 64 coefficients.
(a flat matrix is thought to statistically minimize mean square error).
The file inter.mat contains an alternate matrix, used in the MPEG-2 test
model.
/* name of statistics file */
Statistics output is stored into the specified file. - directs statistics
output to stdout.
/* input picture file format */
A number defining the format of the source input frames.
Code Format description
---- ------------------------------------------------------------------
0 separate files for luminance (.Y extension), and chrominance (.U, .V)
all files are in headerless 8 bit per pixel format. .U and .V must
correspond to the selected chroma_format (4:2:0, 4:2:2, 4:4:4, see
below). Note that in this document, Cb = U, and Cr = V. This format
is also used in the Stanford PVRG encoder.
1 similar to 0, but concatenated into one file (extension .yuv).
This is the format used by the Berkeley MPEG-1 encoder.
2 PPM, Portable PixMap, only the raw format (P6) is supported.
/* number of frames */
This defines the length of the sequence in integer units of frames.
/* number of first frame */
Usually 0 or 1, but any other (positive) value is valid.
/* timecode of first frame */
This line is used to set the timecode encoded into the first 'Group of
Pictures' header. The format is based on the SMPTE style:
hh:mm:ss:ff (hh=hour, mm=minute, ss=second, ff=frame (0..picture_rate-1)
/* N (# of frames in GOP) */
This defines the distance between I frames (and 'Group of Pictures'
headers). Common values are 15 for 30 Hz video and 12 for 25 Hz video.
/* M (I/P frame distance) */
Distance between consecutive I or P frames. Usually set to 3.
N has to be a multiple of M. M = 1 means no B frames in the sequence.
(in a future edition of this program, M=0 will mean only I frames).
/* ISO/IEC 11172-2 stream */
Set to 1 if you want to generate an MPEG-1 sequence. In this case some
of the subsequent MPEG-2 specific values are ignored.
/* picture format */
0 selects frame picture coding, in which both fields of a frame are coded
simultaneously, 1 select field picture coding, where fields are coded
separately. The latter is permitted for interlaced video only.
/* horizontal_size */
Pixel width of the frames. It does not need to be a multiple of 16.
You have to provide a correct value even for PPM files (the PPM
file header is currently ignored).
/* vertical_size */
Pixel height of the frames. It does not need to be a multiple of 16.
You have to provide a correct value even for PPM files (the PPM file
header is currently ignored).
/* aspect_ratio_information */
Defines the display aspect ratio. Legal values are:
Code Meaning
---- --------------
1 square pels
2 4:3 display
3 16:9 display
4 2.21:1 display
MPEG-1 uses a different coding of aspect ratios. In this cases codes
1 to 14 are valid.
/* frame_rate_code */
Defines the frame rate (for interlaced sequences: field rate is twice
the frame rate). Legal values are:
Code Frames/sec Meaning
---- ---------- -----------------------------------------------
1 24000/1001 23.976 fps -- NTSC encapsulated film rate
2 24 Standard international cinema film rate
3 25 PAL (625/50) video frame rate
4 30000/1001 29.97 -- NTSC video frame rate
5 30 NTSC drop-frame (525/60) video frame rate
6 50 double frame rate/progressive PAL
7 60000/1001 double frame rate NTSC
8 60 double frame rate drop-frame NTSC
/* bit_rate */
A positive floating point value specifying the target bitrate.
In units of bits/sec.
/* vbv_buffer_size (in multiples 16 kbit) */
Specifies, according to the Video Buffering Verifier decoder model,
the size of the bitstream input buffer required in downstream
decoders in order for the sequence to be decoded without underflows or
or overflows. You probably will wish to leave this value at 112 for
MPEG-2 Main Profile at Main Level, and 20 for Constrained Parameters
Bitstreams MPEG-1.
/* low_delay */
When set to 1, this flag specifies whether encoder operates in low delay
mode. Essentially, no B pictures are coded and a different rate control
strategy is adopted which allows picture skipping and VBV underflows.
This feature has not yet been implemented. Please leave at zero for now.
/* constrained_parameters_flag */
Always 0 for MPEG-2. You may set this to 1 if you encode an MPEG-1
sequence which meets the parameter limits defined in ISO/IEC 11172-2
for constrained parameter bitstreams:
horizontal_size <= 768
vertical_size <= 576
picture_area <= 396 macroblocks
pixel_rate <= 396x25 macroblocks per second
vbv_buffer_size <= 20x16384 bit
bitrate <= 1856000 bits/second
motion vector range <= -64...63.5
/* Profile ID */
Specifies the subset of the MPEG-2 syntax required for decoding the
sequence. All MPEG-2 sequences generated by the current version of
the encoder are either Main Profile or Simple Profile sequences.
Code Meaning Typical use
---- -------------------------- ------------------------
1 High Profile production equipment requiring 4:2:2
2 Spatially Scalable Profile Simulcasting
3 SNR Scalable Profile Simulcasting
4 Main Profile 95 % of TVs, VCRs, cable applications
5 Simple Profile Low cost memory, e.g. no B pictures
/* Level ID */
Specifies coded parameter constraints, such as bitrate, sample rate, and
maximum allowed motion vector range.
Code Meaning Typical use
---- --------------- -----------------------------------------------
4 High Level HDTV production rates: e.g. 1920 x 1080 x 30 Hz
6 High 1440 Level HDTV consumer rates: e.g. 1440 x 960 x 30 Hz
8 Main Level CCIR 601 rates: e.g. 720 x 480 x 30 Hz
10 Low Level SIF video rate: e.g. 352 x 240 x 30 Hz
/* progressive_sequence */
0 in the case of a sequences containing interlaced video (e.g.
video camera source), 1 for progressive video (e.g. film source).
/* chroma_format */
Specifies the resolution of chrominance data
Code Meaning
---- ------- ----------------------------------------
1 4:2:0 half resolution in both dimensions (most common format)
2 4:2:2 half resolution in horizontal direction (High Profile only)
3 4:4:4 full resolution (not allowed in any currently defined profile)
/* video_format: 0=comp., 1=PAL, 2=NTSC, 3=SECAM, 4=MAC, 5=unspec. */
/* color_primaries */
Specifies the x, y chromaticity coordinates of the source primaries.
Code Meaning
---- -------
1 ITU-R Rec. 709 (1990)
2 unspecified
4 ITU-R Rec. 624-4 System M
5 ITU-R Rec. 624-4 System B, G
6 SMPTE 170M
7 SMPTE 240M (1987)
/* transfer_characteristics */
Specifies the opto-electronic transfer characteristic of the source picture.
Code Meaning
---- -------
1 ITU-R Rec. 709 (1990)
2 unspecified
4 ITU-R Rec. 624-4 System M
5 ITU-R Rec. 624-4 System B, G
6 SMPTE 170M
7 SMPTE 240M (1987)
8 linear transfer characteristics
/* matrix_coefficients */
Specifies the matrix coefficients used in deriving luminance and chrominance
signals from the green, blue, and red primaries.
Code Meaning
---- -------
1 ITU-R Rec. 709 (1990)
2 unspecified
4 FCC
5 ITU-R Rec. 624-4 System B, G
6 SMPTE 170M
7 SMPTE 240M (1987)
/* display_horizontal_size */
/* display_vertical_size */
Display_horizontal_size and display_vertical_size specify the "intended
display's" active region (which may be smaller or larger than the
encoded frame size).
/* intra_dc_precision */
Specifies the effective precision of the DC coefficient in MPEG-2
intra coded macroblocks. 10-bits usually achieves quality saturation.
Code Meaning
---- -----------------
0 8 bit
1 9 bit
2 10 bit
3 11 bit
/* top_field_first */
Specifies which of the two fields of an interlaced frame comes earlier.
The top field corresponds to what is often called the "odd field," and
the bottom field is also sometimes called the "even field."
Code Meaning
---- -----------------
0 bottom field first
1 top field first
/* frame_pred_frame_dct (I P B) */
Setting this parameter to 1 restricts motion compensation to frame
prediction and DCT to frame DCT. You have to specify this separately for
I, P and B picture types.
/* concealment_motion_vectors (I P B) */
Setting these three flags informs encoder whether or not to generate
concealment motion vectors for intra coded macroblocks in the
three respective coded picture types. This feature is mostly useful
in Intra-coded pictures, but may also be used in low-delay applications
(which attempts to exclusively use P pictures for video signal refresh,
saving the time it takes to download a coded Intra picture across a
channel). concealment_motion_vectors in B pictures are rather pointless
since there is no error propagation from B pictures. This feature is
currently not implemented. Please leave values at zero.
/* q_scale_type (I P B) */
These flag sets linear (0) or non-linear (1) quantization scale type
for the three respective picture types.
/* intra_vlc_format (I P B) */
Selects one of the two variable length coding tables for intra coded blocks.
Table 1 is considered to be statistically optimized for Intra coded
pictures coded within the sweet spot range (e.g. 0.3 to 0.6 bit/pixel)
of MPEG-2.
Code Meaning
---- -----------------
0 table 0 (= MPEG-1)
1 table 1
/* alternate_scan (I P B) */
Selects one of two entropy scanning patterns defining the order in
which quantized DCT coefficients are run-length coded. The alternate
scanning pattern is considered to be better suited for interlaced video
where the encoder does not employ sophisticated forward quantization
(as is the case in our current encoder).
Code Meaning
---- -----------------
0 Zig-Zag scan (= MPEG-1)
1 Alternate scan
/* repeat_first_field */
If set to one, the first field of a frame is repeated after the second by
the display process. The exact function depends on progressive_sequence
and top_field_first. repeat_first_field is mainly intended to serve as
a signal for the Decoder's Display Process to perform 3:2 pulldown.
/* progressive_frame */
Specifies whether the frames are interlaced (0) or progressive (1).
MPEG-2 permits mixing of interlaced and progressive video. The encoder
currently only supports either interlaced or progressive video.
progressive_frame is therefore constant for all frames and usually
set identical to progressive_sequence.
/* intra_slice refresh picture period (P factor) */
This value indicates the number of successive P pictures in which
all slices (macroblock rows in our encoder model) are refreshed
with intra coded macroblocks. This feature assists low delay mode
coding. It is currently not implemented.
/* rate control: r (reaction parameter) */
/* rate control: avg_act (initial average activity) */
/* rate control: Xi (initial I frame global complexity measure) */
/* rate control: Xp (initial P frame global complexity measure) */
/* rate control: Xb (initial B frame global complexity measure) */
/* rate control: d0i (initial I frame virtual buffer fullness) */
/* rate control: d0p (initial P frame virtual buffer fullness) */
/* rate control: d0b (initial B frame virtual buffer fullness) */
These parameters modify the behavior of the rate control scheme. Usually
set them to 0, in which case default values are computed by the encoder.
/* P: forw_hor_f_code forw_vert_f_code search_width/height */
/* B1: forw_hor_f_code forw_vert_f_code search_width/height */
/* B1: back_hor_f_code back_vert_f_code search_width/height */
/* B2: forw_hor_f_code forw_vert_f_code search_width/height */
/* B2: back_hor_f_code back_vert_f_code search_width/height */
This set of parameters specifies the maximum length of the motion
vectors. If this length is set smaller than the actual movement
of objects in the picture, motion compensation becomes ineffective
and picture quality drops. If it is set too large, an excessive
number of bits is allocated for motion vector transmission, indirectly
reducing picture quality, too.
All f_code values have to be in the range 1 to 9 (1 to 7 for MPEG-1),
which translate into maximum motion vector lengths as follows:
code range (inclusive) max search width/height
================================================
1 -8 ... +7.5 7
2 -16 ... +15.5 15
3 -32 ... +31.5 31
4 -64 ... +63.5 63
5 -128 ... +127.5 127
6 -256 ... +255.5 255
7 -512 ... +511.5 511
8 -1024 ... +1023.5 1023
9 -2048 ... +2047.5 2047
f_code is specified individually for each picture type (P,Bn), direction
(forward prediction, backward prediction) and component (horizontal,
vertical). Bn is the n'th B frame surrounded by I or P frames
(e.g.: I B1 B2 B3 P B1 B2 B3 P ...).
For MPEG-1 sequences, horizontal and vertical f_code have to be
identical and the range is restricted to 1...7.
P frame values have to be specified if N (N = # of frames in GOP) is
greater than 1 (otherwise the sequences contains only I frames).
M - 1 (M = distance between I/P frames) sets (two lines each) of values
have to specified for B frames. The first line of each set defines
values for forward prediction (i.e. from a past frame), the second
line those for backward prediction (from a future frame).
search_width and search_height set the (half) width of the window used
for motion estimation. The encoder currently employs exhaustive
integer vector block matching. Execution time for this algorithm depends
on the product of search_width and search_height and, too a large extent,
determines the speed of the encoder. Therefore these values have to be
chosen carefully.
Here is an example of how to set these values, assuming a maximum
motion of 10 pels per frame in horizontal and 5 pels per frame in
vertical direction and M=3 (I B1 B2 P):
search width / height:
forward hor. vert. backward hor. vert.
I -> B1 10 5 B1 <- P 20 10
I -> B2 20 10 B2 <- P 10 5
I -> P 30 15
f_code values are then selected as the smallest ones resulting in a range
larger than the search widths / heights:
3 2 30 15 /* P: forw_hor_f_code forw_vert_f_code search_width/height */
2 1 10 5 /* B1: forw_hor_f_code forw_vert_f_code search_width/height */
3 2 20 10 /* B1: back_hor_f_code back_vert_f_code search_width/height */
3 2 20 10 /* B2: forw_hor_f_code forw_vert_f_code search_width/height */
2 1 10 5 /* B2: back_hor_f_code back_vert_f_code search_width/height */
4. Interpreting the status information
======================================
4.1 Description of the macroblock_type map
The status information routine prints a two-character code for
each macroblock in the picture currently being coded:
First character:
S: skipped
I: intra coded
0: forward prediction without motion compensation
F: forward frame/16x8 prediction (in frame/field pictures resp.)
f: forward field prediction
p: dual prime prediction
B: backward frame/16x8 prediction
b: backward field prediction
D: frame/16x8 interpolation
d: field interpolation
Second character:
space: coded, no quantization change
Q: coded, quantization change
N: not coded (only predicted)
4.2 Description of the mquant map
The mquant map displays the dynamically changing quantization
parameter as determined by the rate control algorithm. Larger
values correspond to coarser quantization. Not displayed values
indicate same quantization as in the previous macroblock.
Note that for for MPEG-1 sequences the displayed values are twice
as large as the quant_scale parameter defined in ISO 11172-2.
5. Encoder model
================
The encoder model describes the "algorithm" or "intelligence" of the
encoder.
5.1 Motion estimation
Exhaustive search pattern with L1 (Minimum Absolute Difference) matching
criteria on the original (non-coded) reference frame. Performed only
on luminance samples. Half pel search is done on a neighborhood of 8
bi-linearly interpolated luminance samples from the reconstructed
reference frame.
5.2 Rate control and adaptive quantization
The encoder adheres to a single-pass coding philosophy (no a priori
measurements guide the allocation of bits at the global layers).
Target bit allocation for the current picture being encoded is based
on global bit budget for the Group of Pictures (dependently coded
sequence of pictures), and a ratio of weighted relative coding
complexities of the three picture types. Coding complexity (X) is
estimated as a picture's product of: average macroblock quantization
step size (Q) and the number of bits (S) generated by coding the picture
(e.g. X = Q*S).
Local bit allocation is based on two measurements:
1. deviancy from estimated buffer fullness for the Nth macroblock
2. normalized spatial activity
The picture is approximated and estimated to have a uniform distribution
of bits. If the local trend of generated bits begin to stray from this
estimation, a compensation factor emerges to modulate the macroblock
quantization scale (mquant). The compensation factor is the difference
between where the buffer fullness was predicted at the Nth macroblock
and where the buffer fullness truly is)
Local spatial activity is estimated as the minimum variance of the
four luminance 8x8 frame blocks and the corresponding four luminance 8x8
field blocks of the original picture being coded. The variance measure
is then normalized against the average variance of the most recently coded
picture. The scaled product of the local activity measure and the buffer
compensation factor are combined to modify the final mquant value which
will be used in the quantization stage of the subsequent block coding
processes.
5.3 Decision
The macroblock decision process selects the various prediction
switches (yes/no coding of prediction error, yes/no motion
compensation) which provide the best quality for the relative coding
bit cost. The decision process follows the piecewise path of least
distortion, approximating that the decision stages (whether to use
forwards, backwards compensation or none at all..., etc.) are mostly
uncorrelated (i.e. can be independently decided).
5.4 Macroblock coding
The forward DCT is based on the fast Chen-Wang 1984 algorithm.
Quantization is a mirror of the normative inverse quantizer, as
specified in the MPEG-2 Test Model. No special dithering or
entropy-constrained methods are currently employed.
6. References
=============
[1] ISO/IEC 13818 Draft International Standard: Generic Coding of
Moving Pictures and Associated Audio, Part 2: video
[2] Test Model 5, ISO/IEC JTC1/SC29/WG11/ N0400, MPEG93/457, April 1993.